suppressPackageStartupMessages({
  library("tidyverse")
  library("plotly")
  library("treeio")
  library("ggtree")
  library("ggtreeExtra")
  library("RColorBrewer")
  library("cowplot")
  library("shadowtext")
})

Create Supplementary Table 1, with one row per biosynthetic gene cluster (BGC)

Rscript notebook/bgc_table.R \
  --antismash_dir ~/wwtphqmags/antismash/6.0.1/ \
  --bigscape_dir ~/wwtphqmags/bigscape/wwtphqmags_antismash_6.0.1/network_files/2022-02-10_10-03-49_glocal_wwtphqmags_antismash_6.0.1/ \
  --output tables/wwtphqmags_bgcs.csv
## # A tibble: 4,242 × 9
##    bgc_id         GCF   genome_id contig  start    end product contig_edge class
##    <chr>          <chr> <chr>     <chr>   <dbl>  <dbl> <chr>   <lgl>       <chr>
##  1 CP064957.1.re… 1911  GCA_0166… CP064… 2.67e5 3.09e5 NRPS-l… FALSE       NRPS 
##  2 CP064958.1.re… 1912  GCA_0166… CP064… 2.35e5 2.56e5 CDPS    FALSE       Othe…
##  3 CP064960.1.re… 1913  GCA_0166… CP064… 1.93e4 4.01e4 hserla… FALSE       Othe…
##  4 CP064960.1.re… 1914  GCA_0166… CP064… 1.86e5 2.07e5 terpene FALSE       Terp…
##  5 CP064963.1.re… 1915  GCA_0166… CP064… 1.30e5 1.73e5 NRPS-l… FALSE       NRPS 
##  6 CP064963.1.re… 1916  GCA_0166… CP064… 2.67e5 2.76e5 RiPP-l… FALSE       RiPPs
##  7 CP064963.1.re… 1917  GCA_0166… CP064… 1.53e6 1.61e6 hglE-K… FALSE       Othe…
##  8 CP064963.1.re… 1918  GCA_0166… CP064… 3.85e6 3.86e6 RiPP-l… FALSE       RiPPs
##  9 CP064963.1.re… 1919  GCA_0166… CP064… 4.36e6 4.38e6 RRE-co… FALSE       RiPPs
## 10 CP064964.1.re… 1920  GCA_0166… CP064… 2.23e4 4.32e4 terpene FALSE       Terp…
## # … with 4,232 more rows
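
As a quick sanity check on a table of this shape, BGCs can be tallied per class. The sketch below uses a small invented data frame in place of `tables/wwtphqmags_bgcs.csv`. Only the column names (`bgc_id`, `genome_id`, `contig_edge`, `class`) come from the output above. The IDs and values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the BGC table; every value here is invented.
bgcs <- data.frame(
  bgc_id      = c("toy_1", "toy_2", "toy_3", "toy_4", "toy_5"),
  genome_id   = c("genome_A", "genome_A", "genome_B", "genome_B", "genome_C"),
  contig_edge = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  class       = c("RiPPs", "NRPS", "RiPPs", "NRPS", "RiPPs")
)

# Count BGCs per class, most frequent class first.
class_counts <- bgcs %>%
  count(class, name = "n_bgcs", sort = TRUE)

print(class_counts)
```

With the real table, replacing the toy data frame with `readr::read_csv("tables/wwtphqmags_bgcs.csv")` should give the same kind of summary, assuming the CSV carries the columns shown above.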

Create Supplementary Table 2, with one row per metagenome-assembled genome (MAG)

Rscript notebook/genome_table.R \
  --bgcs_table tables/wwtphqmags_bgcs.csv \
  --supplementary_file data/singleton_2021_table3.xlsx \
  --assembly_details data/assembly_details.txt \
  --output tables/wwtphqmags_genomes.csv
## # A tibble: 1,080 × 11
##    genome_id       total_bgcs bgcs_on_contig_edge gtdb_taxonomy   assembly_level
##    <chr>                <dbl>               <dbl> <chr>           <chr>         
##  1 GCA_016699045.1          1                   0 d__Bacteria;p_… Complete/Chro…
##  2 GCA_016705575.1          5                   0 d__Bacteria;p_… Contig/Scaffo…
##  3 GCA_016705605.1          2                   0 d__Bacteria;p_… Contig/Scaffo…
##  4 GCA_016705565.1          2                   0 d__Bacteria;p_… Contig/Scaffo…
##  5 GCA_016705545.1          4                   0 d__Bacteria;p_… Contig/Scaffo…
##  6 GCA_016705525.1          5                   1 d__Bacteria;p_… Contig/Scaffo…
##  7 GCA_016705495.1          1                   0 d__Bacteria;p_… Contig/Scaffo…
##  8 GCA_016705475.1          6                   0 d__Bacteria;p_… Contig/Scaffo…
##  9 GCA_016705465.1          3                   1 d__Bacteria;p_… Contig/Scaffo…
## 10 GCA_016705435.1          2                   0 d__Bacteria;p_… Contig/Scaffo…
## # … with 1,070 more rows, and 6 more variables: checkm_completeness <dbl>,
## #   checkm_contamination <dbl>, genome_size <dbl>, ncbi_bioproject <chr>,
## #   mimag_quality <chr>, source <chr>
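
The per-genome columns `total_bgcs` and `bgcs_on_contig_edge` look like aggregates of the BGC table. The sketch below shows one way such counts could be derived. It is an assumption about the logic, not the actual contents of `notebook/genome_table.R`, and the data frame is invented toy data.

```r
library(dplyr)

# Toy BGC table; IDs and values are invented for illustration.
bgcs <- data.frame(
  bgc_id      = c("toy_1", "toy_2", "toy_3", "toy_4", "toy_5"),
  genome_id   = c("genome_A", "genome_A", "genome_B", "genome_B", "genome_C"),
  contig_edge = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)

# One row per genome: number of BGCs, and how many of them sit on a contig edge.
genome_summary <- bgcs %>%
  group_by(genome_id) %>%
  summarise(
    total_bgcs          = n(),
    bgcs_on_contig_edge = sum(contig_edge),  # TRUE counts as 1
    .groups = "drop"
  )

print(genome_summary)
```

Genomes with no BGCs at all would not appear in such a summary, so a real pipeline would presumably need a join against the full genome list to fill in zeros.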

Histogram

New classes and color palettes

Barplot

Boxplots

Genome size scatter plots

Pearson correlation between genome size and total BGC count for the complete dataset

cor(x = scatter_ds$genome_size, y = scatter_ds$total_bgcs)
## [1] 0.6175123
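
For reference, `cor()` defaults to Pearson's r, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks that identity on invented toy vectors (not the real `scatter_ds`) and shows `cor.test()`, which also reports a confidence interval and p-value.

```r
# Toy vectors, invented for illustration only.
genome_size <- c(2.1, 3.4, 4.0, 5.2, 6.8, 7.5)
total_bgcs  <- c(1, 3, 2, 5, 6, 9)

# Pearson's r from its definition.
r_manual <- cov(genome_size, total_bgcs) / (sd(genome_size) * sd(total_bgcs))

# Built-in equivalent; "pearson" is the default method.
r_builtin <- cor(genome_size, total_bgcs, method = "pearson")
stopifnot(isTRUE(all.equal(r_manual, r_builtin)))

# cor.test() adds a 95% confidence interval and a p-value for r.
ct <- cor.test(genome_size, total_bgcs, method = "pearson")
print(ct$estimate)
print(ct$conf.int)
print(ct$p.value)
```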

Pearson correlation without the three phyla highlighted above, keeping only genomes whose phylum is labeled "Other"

without_ds <- scatter_ds[scatter_ds$phylum == "Other", ]
cor(x = without_ds$genome_size, y = without_ds$total_bgcs)
## [1] 0.346717
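
The drop in correlation after subsetting is what one would expect if a few groups with large genomes and many BGCs drive much of the overall trend. The sketch below reproduces that effect on invented toy data. The `"Highlighted"` label and all numbers are hypothetical and only illustrate the mechanism, not the real dataset.

```r
# Toy data: "Other" genomes are small with few BGCs and no clear trend;
# "Highlighted" genomes are large and BGC-rich. All values are invented.
toy <- data.frame(
  phylum      = c(rep("Other", 6), rep("Highlighted", 4)),
  genome_size = c(2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 7.0, 7.5, 8.0, 8.5),
  total_bgcs  = c(3, 1, 4, 2, 3, 2, 10, 12, 9, 13)
)

# Correlation across all genomes.
r_all <- cor(toy$genome_size, toy$total_bgcs)

# Correlation within the "Other" subset only, mirroring the subsetting above.
other   <- toy[toy$phylum == "Other", ]
r_other <- cor(other$genome_size, other$total_bgcs)

print(c(all = r_all, other_only = r_other))
```

In this toy case the pooled correlation is much stronger than the within-subset one, because the between-group difference dominates the pooled estimate.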

Tree of the 1080 genomes


Boxplots for relevant genera